
feat: add MSSQL/Fabric support to data-parity skill #705

Open
suryaiyer95 wants to merge 18 commits into main from feat/mssql-fabric-data-parity

Conversation


@suryaiyer95 suryaiyer95 commented Apr 13, 2026

Summary

  • SQL Server driver: full connect/execute/listSchemas/listTables/describeTable/close with T-SQL TOP injection, sys.* catalog queries, and row flattening for unnamed columns
  • Azure AD authentication: 7 flows (default, password, access-token, service-principal-secret, msi-vm, msi-app-service, token-credential) with shorthand aliases (cli, default, password, service-principal, msi)
  • Dialect mapping: sqlserver/mssql→tsql, fabric→fabric with DATETRUNC() and CONVERT(DATE, ..., 23) for date partitioning
  • mssql v12 upgrade: ConnectionPool isolation for concurrent connections, unnamed-column array flattening, synthetic column name fallback
  • SKILL.md documentation: Fabric connection config, Azure AD auth types, algorithm behavior, schema inspection queries
  • 28 driver unit tests + 9 dialect mapping tests
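The TOP injection mentioned above can be sketched as follows — a hypothetical standalone helper, not the PR's actual implementation. SQL Server has no LIMIT clause, so a row cap has to be rewritten into the SELECT itself:

```typescript
// Naive sketch of T-SQL row limiting (hypothetical helper).
// Rewrites only the first SELECT; the real driver may handle
// CTEs, subqueries, and existing TOP clauses more carefully.
function injectTop(sql: string, limit: number): string {
  if (/\bselect\s+top\b/i.test(sql)) return sql // already limited
  return sql.replace(/\bselect\b/i, `SELECT TOP (${limit})`)
}
```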

Key files

  • Driver: packages/drivers/src/sqlserver.ts, packages/drivers/test/sqlserver-unit.test.ts
  • Dialect mapping: packages/opencode/src/altimate/native/connections/data-diff.ts
  • Tool registration: packages/opencode/src/altimate/tools/data-diff.ts
  • Skill docs: .opencode/skills/data-parity/SKILL.md

Test plan

  • bun test packages/drivers/test/sqlserver-unit.test.ts — 28 tests pass
  • bun test packages/opencode/test/altimate/data-diff-dialect.test.ts — 9 tests pass
  • Integration test against real MSSQL/Fabric instance
  • Verify Azure AD auth flows with actual Azure credentials

🤖 Generated with Claude Code


Summary by cubic

Adds full MSSQL and Microsoft Fabric support to the data-parity skill with Azure AD auth, dialect mapping, and a mssql v12 upgrade. Improves cross‑warehouse SQL handling and Azure auth reliability, including token caching, sovereign cloud support, and lockfile cleanup for @azure/identity.

  • New Features

    • Data-parity: dialect-aware identifier quoting in SQL‑query wrapping and per‑dialect mapping (sqlserver/mssql→tsql, fabric→fabric, postgresql→postgres, mariadb→mysql).
    • SQL Server driver: explicit ConnectionPool isolation, Azure AD access‑token cache with proactive refresh, automatic resource URL inference for Gov/China with azure_resource_url override, and async az CLI fallback; @azure/identity is no longer a peer dependency (bundled via tedious).
  • Bug Fixes

    • Cross-warehouse SQL‑query mode: inject side‑specific CTEs per task to prevent T‑SQL/Fabric from parsing the other side’s CTE.
    • Cross‑dialect partitioned diffs: build per‑side partition WHERE clauses and wrap as subqueries; normalize dates to ISO and cast with CONVERT(DATE, ..., 23) for T‑SQL/Fabric; escape single quotes.
    • Guardrails: disallow joindiff across different warehouses with a clear error suggesting hashdiff/auto.
    • Driver robustness: preserve named headers when flattening unnamed aggregate columns; fail fast on unknown Azure AD subtypes and guard non‑string authentication.
    • Diff output: show "(no values)" when a diff row has an empty values array.
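The partitioned-diff fix above (normalize dates to ISO, cast with CONVERT(DATE, ..., 23) on T-SQL/Fabric, escape single quotes) can be illustrated with a minimal sketch; the helper name and exact SQL shape here are assumptions, not the PR's code:

```typescript
// Hypothetical dialect-aware date-partition predicate. Style 23 in
// CONVERT parses an ISO yyyy-mm-dd literal locale-safely on T-SQL
// and Fabric; other dialects get a plain ANSI cast.
function datePartitionWhere(column: string, isoDate: string, dialect: string): string {
  const escaped = isoDate.replace(/'/g, "''") // defend against quote injection
  if (dialect === "tsql" || dialect === "fabric") {
    return `CONVERT(DATE, [${column}]) = CONVERT(DATE, '${escaped}', 23)`
  }
  return `CAST("${column}" AS DATE) = DATE '${escaped}'`
}
```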

Written for commit 3ebcec1. Summary will update on new commits.

Summary by CodeRabbit

  • New Features

    • Added Microsoft Fabric support as a recognized connection type and unified Fabric/SQL Server config aliases.
    • Expanded Azure AD authentication options (access-token, service-principal, MSI) with token caching and fallback behavior.
  • Bug Fixes

    • Improved result shaping, truncation reporting, and column mapping for complex query results.
    • Added cross-warehouse/dialect guards and dialect-aware partition/date SQL generation.
  • Documentation

    • Added Fabric/SQL Server connection and version guidance, and schema inspection examples.
  • Tests

    • Large suite of tests covering auth flows, Fabric behavior, dialect mapping, CTE injection, and diff logic.


@claude claude bot left a comment


Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.


coderabbitai bot commented Apr 13, 2026

📝 Walkthrough

Walkthrough

Adds Microsoft Fabric support and SQL Server driver Azure AD token flows (cached tokens, @azure/identity + az CLI fallback), normalizes Fabric-related config fields, introduces warehouse→dialect mapping and dialect-aware partition SQL, updates mssql dependency, and adds extensive unit tests and documentation.

Changes

Cohort / File(s) Summary
Driver deps & config normalization
packages/drivers/package.json, packages/drivers/src/normalize.ts
Bumped optional mssql to ^12.0.0; expanded SQL Server–family alias map to normalize Azure auth fields (authentication, azure_*, access_token, azure_resource_url) and applied aliases to fabric.
SQL Server / Fabric driver
packages/drivers/src/sqlserver.ts, packages/drivers/test/sqlserver-unit.test.ts
Added module-scoped Azure AD token cache, token acquisition via @azure/identity with az CLI fallback, multiple Azure AD auth subtypes (password, access-token, MSI, service-principal, default), connection-pool creation, auth normalization, result shaping/truncation logic, and exported _resetTokenCacheForTests(). Added comprehensive unit tests covering auth flows, token caching, SQL rewriting, truncation, and schema introspection.
Registry & wiring
packages/opencode/src/altimate/native/connections/registry.ts, packages/opencode/test/altimate/connections.test.ts
Registered fabric→@altimateai/drivers/sqlserver and included fabric in password-validated driver set; added registry test for fabric.
Data-diff orchestration
packages/opencode/src/altimate/native/connections/data-diff.ts, packages/opencode/test/altimate/data-diff-cte.test.ts, packages/opencode/test/altimate/data-diff-dialect.test.ts, packages/opencode/test/altimate/data-diff-cross-dialect.test.ts
Added WAREHOUSE_TO_DIALECT + warehouseTypeToDialect(), tightened SQL-query detection, extended resolveTableSources to return side-specific CTE prefixes, introduced dialect-aware quoting and partition SQL (T-SQL/Fabric DATETRUNC + CONVERT, Postgres DATE_TRUNC), changed per-partition wrapping to side-specific subqueries, and adjusted cross-warehouse detection/guards and CTE injection behavior. Added tests for CTE handling, dialect mapping, and cross-warehouse partitioned diffs.
Formatting & UX tweak
packages/opencode/src/altimate/tools/data-diff.ts
Render sample differences to show "(no values)" when d.values is an empty array.
Normalization tests
packages/opencode/test/altimate/driver-normalize.test.ts
Added normalize-config test validating fabric aliasing (server→host, trustServerCertificate→trust_server_certificate).
Tests: CTE & cross-dialect
packages/opencode/test/altimate/data-diff-cte.test.ts, packages/opencode/test/altimate/data-diff-cross-dialect.test.ts, packages/opencode/test/altimate/data-diff-dialect.test.ts
New Bun test suites verifying resolveTableSources/injectCte behavior, dialect mapping, and cross-warehouse joindiff guard and partition SQL generation.
Documentation
.opencode/skills/data-parity/SKILL.md
Added SQL Server / Fabric schema-introspection snippet and Fabric-specific notes (DATETRUNC requirement, supported warehouses/auth mappings, connection params, algorithm↔dialect mapping).
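The warehouse→dialect mapping summarized above can be sketched from the commit notes (the mapping table itself is stated in the commits; the surrounding code shape is an assumption):

```typescript
// Sketch of the PR's warehouseTypeToDialect: the diff engine expects
// dialect names, not raw warehouse type names, so each supported
// warehouse type is normalized (case-insensitively) to a dialect.
const WAREHOUSE_TO_DIALECT: Record<string, string> = {
  sqlserver: "tsql",
  mssql: "tsql",
  fabric: "fabric",
  postgresql: "postgres",
  mariadb: "mysql",
}

function warehouseTypeToDialect(warehouseType: string): string {
  const key = warehouseType.toLowerCase()
  return WAREHOUSE_TO_DIALECT[key] ?? key // unmapped types pass through
}
```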

Sequence Diagram(s)

sequenceDiagram
    participant App as Application
    participant Driver as SQL Server Driver
    participant AzureId as `@azure/identity`
    participant CLI as az CLI
    participant AzureAD as Azure AD
    participant MsSQL as mssql Library

    App->>Driver: connect(config with Azure AD auth)
    Driver->>Driver: normalize auth type & resolve resource URL
    alt `@azure/identity` available
        Driver->>AzureId: getToken(scope)
        AzureId->>AzureAD: request token
        AzureAD-->>AzureId: return token
        AzureId-->>Driver: access token
    else fallback to az CLI
        Driver->>CLI: exec("az account get-access-token --resource <res>")
        CLI->>AzureAD: request token
        AzureAD-->>CLI: return token
        CLI-->>Driver: token string
    end
    Driver->>MsSQL: new ConnectionPool(config with token)
    MsSQL-->>Driver: connected pool
    Driver-->>App: connected instance
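The token flow in the diagram pairs with the summary's mention of a cached access token with proactive refresh. A minimal sketch, with an assumed 5-minute margin and a pluggable acquire callback standing in for the @azure/identity / az CLI chain:

```typescript
// Hypothetical module-scoped Azure AD token cache with proactive
// refresh; the driver's actual cache keying and margin may differ.
type CachedToken = { token: string; expiresOnMs: number }

const tokenCache = new Map<string, CachedToken>()
const REFRESH_MARGIN_MS = 5 * 60 * 1000 // assumed 5-minute safety margin

// A cached token is reused only while it is comfortably before expiry.
function isFresh(entry: CachedToken | undefined, nowMs: number): entry is CachedToken {
  return !!entry && entry.expiresOnMs - nowMs > REFRESH_MARGIN_MS
}

async function getAzureToken(
  resource: string,
  acquire: () => Promise<CachedToken>, // e.g. @azure/identity, then az CLI fallback
): Promise<string> {
  const hit = tokenCache.get(resource)
  if (isFresh(hit, Date.now())) return hit.token
  const fresh = await acquire()
  tokenCache.set(resource, fresh)
  return fresh.token
}
```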

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Suggested reviewers

  • mdesmet
  • kulvirgit

Poem

🐇 I hopped through configs, tokens in tow,
I fetched a scope where Fabrics grow,
From credential breeze to CLI song,
I stitched two dialects and hopped along,
A little rabbit cheers this code — hooray! 🎉

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 55.56%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Description check — ⚠️ Warning: PR description is missing the required 'PINEAPPLE' identifier at the top, as mandated by the repository template for AI-generated contributions. Resolution: add 'PINEAPPLE' as the first line of the PR description.

✅ Passed checks (1 passed)

  • Title check — ✅ Passed: the title 'feat: add MSSQL/Fabric support to data-parity skill' accurately and concisely summarizes the main objective of the changeset.



@cubic-dev-ai cubic-dev-ai bot left a comment


5 issues found across 30 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/opencode/src/altimate/native/connections/data-diff.ts">

<violation number="1" location="packages/opencode/src/altimate/native/connections/data-diff.ts:266">
P2: Missing single-quote escaping on `partitionValue` in the date-mode branches. The categorical mode (6 lines above) escapes with `.replace(/'/g, "''")`, but none of the date-mode `switch` cases do. Apply the same escaping for consistency and defense in depth.</violation>

<violation number="2" location="packages/opencode/src/altimate/native/connections/data-diff.ts:489">
P2: `partitionColumn` is not identifier-quoted in `buildPartitionDiscoverySQL`, but it is quoted in `buildPartitionWhereClause`. If the column name is a reserved word (e.g. `date`, `order`), the discovery query will produce a syntax error. Quote the column consistently between both functions.</violation>
</file>
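A sketch of consistent T-SQL identifier quoting shared by both functions, as this violation suggests (hypothetical helper; an embedded `]` is doubled per T-SQL bracket-quoting rules):

```typescript
// Hypothetical shared quoting helper so buildPartitionDiscoverySQL and
// buildPartitionWhereClause treat reserved-word columns (e.g. "date",
// "order") identically.
function quoteIdentTsql(name: string): string {
  return `[${name.replace(/]/g, "]]")}]`
}
```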

<file name="packages/opencode/src/altimate/native/connections/registry.ts">

<violation number="1" location="packages/opencode/src/altimate/native/connections/registry.ts:125">
P2: `fabric` is missing from the `PASSWORD_DRIVERS` set. Since it maps to the same sqlserver driver as `sqlserver`/`mssql` (both present in the set), `fabric` should also be included to get the same friendly error when `password` is not a string.</violation>
</file>

<file name="packages/opencode/src/altimate/tools/data-diff.ts">

<violation number="1" location="packages/opencode/src/altimate/tools/data-diff.ts:206">
P2: When `d.values` is nullish, `d.values?.join(" | ")` evaluates to `undefined`, which the template literal coerces to the string `"undefined"`. The output would read e.g. `[source only] undefined`. Use a fallback to produce a sensible message.</violation>
</file>
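A minimal sketch of the suggested fallback (hypothetical function shape, not the PR's code):

```typescript
// d.values may be nullish or empty; without a fallback a template
// literal would coerce undefined to the string "undefined".
function renderValues(values: string[] | undefined): string {
  return values && values.length > 0 ? values.join(" | ") : "(no values)"
}
```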

<file name="packages/drivers/src/sqlserver.ts">

<violation number="1" location="packages/drivers/src/sqlserver.ts:148">
P1: `flattenRow` flattens all array values, but only the empty-string key (`""`) holds mssql's merged unnamed columns. A legitimate array column value (e.g. from JSON aggregation) will be incorrectly spread, corrupting the row data and misaligning columns. Restrict flattening to the `""` key only.</violation>
</file>
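A standalone sketch of the fix this violation asks for — flattening only the empty-string key that mssql uses for merged unnamed columns (hypothetical version of the driver's flattenRow):

```typescript
// mssql merges unnamed result columns (e.g. SELECT COUNT(*), SUM(x))
// into an array under the empty-string key; only that key is spread,
// so legitimate array-valued named columns survive intact.
function flattenRow(row: Record<string, unknown>): unknown[] {
  const out: unknown[] = []
  for (const [key, value] of Object.entries(row)) {
    if (key === "" && Array.isArray(value)) {
      out.push(...value) // merged unnamed columns → positional values
    } else {
      out.push(value) // named columns (even arrays) stay as-is
    }
  }
  return out
}
```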

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

@suryaiyer95 suryaiyer95 force-pushed the feat/mssql-fabric-data-parity branch from 403477e to 811c2be Compare April 13, 2026 17:36
@anandgupta42
Contributor

@suryaiyer95 can you address code review comments?


@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/drivers/src/sqlserver.ts">

<violation number="1" location="packages/drivers/src/sqlserver.ts:90">
P2: Avoid `execSync` in the async connection path; it can block the event loop for up to the CLI timeout.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.


@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/opencode/src/altimate/native/connections/data-diff.ts">

<violation number="1" location="packages/opencode/src/altimate/native/connections/data-diff.ts:819">
P2: `crossWarehouse` is computed from unresolved params, so an omitted `source_warehouse` can be misclassified as cross-warehouse when both sides actually run on the same default warehouse.</violation>
</file>
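A sketch of the canonicalization this violation suggests (hypothetical shape; the real code would reuse the project's existing default-warehouse resolver):

```typescript
// Resolve omitted warehouse names to the default before comparing, so
// `undefined` vs an explicit default name is not misclassified as
// cross-warehouse.
function isCrossWarehouse(
  sourceWarehouse: string | undefined,
  targetWarehouse: string | undefined,
  defaultWarehouse: string,
): boolean {
  const src = sourceWarehouse ?? defaultWarehouse
  const tgt = targetWarehouse ?? defaultWarehouse
  return src !== tgt
}
```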

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

@suryaiyer95 suryaiyer95 changed the base branch from feat/data-parity-skill-improvements to main April 17, 2026 05:22
suryaiyer95 and others added 15 commits April 16, 2026 22:26
Bun's runtime never fires native addon async callbacks, so the async
`new duckdb.Database(path, opts, callback)` form would hit the 2-second
timeout fallback on every connection attempt.

Switch to the synchronous constructor form `new duckdb.Database(path)` /
`new duckdb.Database(path, opts)` which throws on error and completes
immediately in both Node and bun runtimes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The async callback form with 2s fallback was already working correctly
at e3df5a4. The timeout was caused by a missing duckdb .node binary,
not a bun incompatibility.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add `warehouseTypeToDialect()` mapping: sqlserver→tsql, mssql→tsql,
  fabric→fabric, postgresql→postgres, mariadb→mysql. Fixes critical
  serde mismatch where Rust engine rejects raw warehouse type names.
- Update both `resolveDialect()` functions to use the mapping
- Add MSSQL/Fabric cases to `dateTruncExpr()` — DATETRUNC(DAY, col)
- Add locale-safe date literal casting via CONVERT(DATE, ..., 23)
- Register `fabric` in DRIVER_MAP (reuses sqlserver TDS driver)
- Add `fabric` normalize aliases in normalize.ts
- Add 15 SQL Server driver unit tests (TOP injection, truncation,
  schema introspection, connection lifecycle, result format)
- Add 9 dialect mapping unit tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Support all 7 Azure AD / Entra ID auth types in `sqlserver.ts`:
  `azure-active-directory-password`, `access-token`, `service-principal-secret`,
  `msi-vm`, `msi-app-service`, `azure-active-directory-default`, `token-credential`
- Force TLS encryption for all Azure AD connections
- Dynamic import of `@azure/identity` for `DefaultAzureCredential`
- Add normalize aliases for Azure AD config fields (`authentication`,
  `azure_tenant_id`, `azure_client_id`, `azure_client_secret`, `access_token`)
- Add `fabric: SQLSERVER_ALIASES` to DRIVER_ALIASES
- Add 10 Azure AD unit tests covering all auth flows, encryption,
  and `DefaultAzureCredential` with managed identity

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…LL.md

- Add SQL Server / Fabric schema inspection query in Step 2
- Add "SQL Server and Microsoft Fabric" section with:
  - Supported configurations table (sqlserver, mssql, fabric)
  - Fabric connection guide with Azure AD auth types
  - Algorithm behavior notes (joindiff vs hashdiff selection)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rscore column filter

- **Azure AD auth**: Pass `azure-active-directory-*` types directly to tedious
  instead of constructing `DefaultAzureCredential` ourselves. Tedious imports
  `@azure/identity` internally and creates credentials — avoids bun CJS/ESM
  `isTokenCredential` boundary issue that caused "not an instance of the token
  credential class" errors.
- **Auth shorthands**: Map `CLI`, `default`, `password`, `service-principal`,
  `msi`, `managed-identity` to their full tedious type names.
- **Column filter**: Remove `_.startsWith("_")` filter from `execute()` result
  columns — it stripped legitimate aliases like `_p` used by partition discovery,
  causing partitioned diffs to return empty results.
- **Tests**: Remove `@azure/identity` mock (no longer imported by driver),
  update auth assertions, add shorthand mapping tests, fix column filter test.
- **Verified**: All 97 driver tests pass. Full data-diff pipeline tested against
  real MSSQL server (profile, joindiff, auto, where_clause, partitioned).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lattening

- Upgrade `mssql` from v11 to v12 (`tedious` 18 → 19)
- Use explicit `ConnectionPool` instead of global `mssql.connect()` to
  isolate multiple simultaneous connections
- Flatten unnamed column arrays — `mssql` merges unnamed columns (e.g.
  `SELECT COUNT(*), SUM(...)`) into a single array under the empty-string
  key; restore positional column values
- Proper column name resolution: compare `namedKeys.length` against
  flattened row length, fall back to synthetic `col_0`, `col_1`, etc.
- Update test mock to export `ConnectionPool` class and `createMockPool`

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tions

Use ternary expressions (`x ? {...} : {}`) instead of short-circuit
(`x && {...}`) to avoid spreading a boolean value.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- P1: restrict `flattenRow` to only spread the empty-string key (`""`)
  where mssql merges unnamed columns, preserving legitimate array values
- P2: escape single quotes in `partitionValue` for date-mode branches in
  `buildPartitionWhereClause` (categorical mode already escaped)
- P2: add `fabric` to `PASSWORD_DRIVERS` set in registry for consistent
  password validation alongside `sqlserver`/`mssql`
- P2: fallback to `"(no values)"` when `d.values` is nullish to prevent
  template literal coercing `undefined` to the string `"undefined"`

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- sqlserver-unit: 3 tests for unnamed column flattening — verifies only
  the empty-string key is spread, legitimate named arrays are preserved
- driver-normalize: fabric type uses SQLSERVER_ALIASES (server → host,
  trustServerCertificate → trust_server_certificate)
- connections: fabric type is recognized in DRIVER_MAP and listed correctly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add "Minimum Version Requirements" table to SKILL.md covering SQL Server
  2022+, mssql v12, and @azure/identity v4 with rationale for each
- Document auth shorthands (CLI, default, password, service-principal, msi)
- Move @azure/identity from dependencies to optional peerDependencies so
  it is NOT installed by default — only required for Azure AD auth
- Add runtime check in sqlserver driver: if Azure AD auth type is requested
  but @azure/identity is missing, throw a clear install instruction error

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…solution

- For `azure-active-directory-default` (CLI/default auth), acquire token
  ourselves instead of delegating to tedious's internal `@azure/identity`
- Strategy: try `DefaultAzureCredential` first, fall back to `az` CLI subprocess
- Bypasses Bun resolving `@azure/identity` to browser bundle where
  `DefaultAzureCredential` is a non-functional stub
- Also bypasses CJS/ESM `isTokenCredential` boundary mismatch
- All 31 driver unit tests pass, verified against real Fabric endpoint

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…oken` when none supplied

The `azure-active-directory-access-token` branch passed
`token: config.token ?? config.access_token` to tedious. When neither
field was set on a connection (e.g. a `fabric-migration` entry that
declared the auth type but no token), tedious threw:

    TypeError: The "config.authentication.options.token" property
    must be of type string

This blocked any Fabric/MSSQL config that relied on ambient credentials
(Azure CLI / managed identity) but used the explicit
`azure-active-directory-access-token` type instead of the `default`
shorthand.

Refactor token acquisition (`DefaultAzureCredential` → `az` CLI
fallback) into a shared `acquireAzureToken()` helper used by both the
`default` path and the `access-token` path when no token was supplied.
Callers that pass an explicit token are unchanged.

Also harden `mock.module("node:child_process", ...)` in
`sqlserver-unit.test.ts` to spread the real module so sibling tests in
the same `bun test` run keep access to `spawn` / `exec` / `fork`.

Tests: 110 pass, 0 fail in `packages/drivers`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ry mode

When `source` and `target` are both SQL queries, `resolveTableSources` wraps
them as `__diff_source` / `__diff_target` CTEs and the executor prepends the
combined `WITH …` block to every engine-emitted task. T-SQL and Fabric
parse-bind every CTE body even when unreferenced, so a task routed to the
source warehouse failed to resolve the target-only base table referenced
inside the unused `__diff_target` CTE (and vice versa), producing
`Invalid object name` errors from the wrong warehouse.

Return side-specific prefixes from `resolveTableSources` alongside the
combined one, and have the executor loop in `runDataDiff` pick the source
or target prefix per task when `source_warehouse !== target_warehouse`.
Same-warehouse behaviour is unchanged.

Adds `data-diff-cte.test.ts` covering plain-name passthrough, both-query
wrapping, side-specific CTE isolation, and CTE merging with engine-emitted
`WITH` clauses (10 tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 333a45c moved `@azure/identity` from `optionalDependencies` to
`peerDependencies` with `optional: true` in `packages/drivers/package.json`,
but the lockfile was not regenerated. That left CI under `--frozen-lockfile`
broken and made fresh installs silently diverge from the committed state.

Running `bun install` brings the lockfile in sync: `@azure/identity` is
recorded as an optional peer, and its transitive pins (`@azure/msal-browser`,
`@azure/msal-common`, `@azure/msal-node`) re-resolve to the versions required
by `tedious` and `snowflake-sdk`, matching the reachable runtime surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@suryaiyer95 suryaiyer95 force-pushed the feat/mssql-fabric-data-parity branch from 52fe09e to 1977232 Compare April 17, 2026 05:26

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

🧹 Nitpick comments (4)
packages/opencode/src/altimate/native/connections/data-diff.ts (1)

815-820: Cross-warehouse detection is a purely textual compare.

sourceWarehouse !== targetWarehouse treats undefined vs an explicit name as "cross-warehouse" even when the explicit name is the default (first) warehouse. The side-specific CTE injection is a strict superset of the combined prefix, so correctness isn't affected — just a minor inefficiency (slightly narrower CTE sent). Not blocking.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/src/altimate/native/connections/data-diff.ts` around lines
815 - 820, Summary: crossWarehouse currently does a raw textual compare
(sourceWarehouse !== targetWarehouse) which can misclassify undefined vs an
explicit default warehouse name as cross-warehouse. Fix: canonicalize both sides
before comparing by resolving undefined/empty values to the effective/default
warehouse name used elsewhere (use the same resolver used for
connection/effective settings) instead of comparing raw params.source_warehouse
and params.target_warehouse; update the variables referenced (sourceWarehouse,
targetWarehouse, crossWarehouse) so crossWarehouse = resolvedSourceWarehouse !==
resolvedTargetWarehouse. Ensure you call the existing resolution helper (or add
one) rather than inventing a new default string.
packages/drivers/test/sqlserver-unit.test.ts (1)

483-489: Assertion gap lets the "mixed named + unnamed" column-naming bug slip through.

This test verifies rows for { name: "alice", "": [42] } but doesn't assert result.columns. Under the current implementation (sqlserver.ts Lines 210-217), mixing a named column with multiple unnamed values (e.g. { name: "alice", "": [42, 100] }) collapses the header to ["col_0", "col_1", "col_2"] and drops name. Adding a case with ≥ 2 unnamed values and asserting columns would have caught that and will guard whichever fix lands.

♻️ Suggested additional case
     test("handles mix of named and unnamed columns", async () => {
       mockQueryResult = {
         recordset: [{ name: "alice", "": [42] }],
       }
       const result = await connector.execute("SELECT * FROM t")
       expect(result.rows).toEqual([["alice", 42]])
     })
+
+    test("preserves named-column header when mixed with multiple unnamed columns", async () => {
+      // SELECT name, COUNT(*), SUM(x) FROM t → { name: "alice", "": [42, 100] }
+      mockQueryResult = {
+        recordset: [{ name: "alice", "": [42, 100] }],
+      }
+      const result = await connector.execute("SELECT name, COUNT(*), SUM(x) FROM t")
+      expect(result.rows).toEqual([["alice", 42, 100]])
+      expect(result.columns[0]).toBe("name")
+    })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/drivers/test/sqlserver-unit.test.ts` around lines 483 - 489, Add a
new test variant in packages/drivers/test/sqlserver-unit.test.ts that uses a
recordset like { name: "alice", "": [42, 100] } (i.e., multiple unnamed values)
and assert both result.rows and result.columns so the header preserves the named
column "name" and includes generated column names for the unnamed values (e.g.,
["name","col_1","col_2"]); this will surface the bug referenced around
sqlserver.ts (logic near lines handling column mapping) and prevent regressions
in connector.execute's column assembly.
packages/opencode/test/altimate/data-diff-dialect.test.ts (1)

51-54: Consider expanding case-insensitivity coverage.

The case-insensitivity block only exercises sqlserver and postgresql. Since mssql, fabric, and mariadb go through the same normalization path, a one-line additional assertion (e.g., MSSQL→tsql, FABRIC→fabric) would guard against a future regression where only some branches get lowercased.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/test/altimate/data-diff-dialect.test.ts` around lines 51 -
54, Add assertions to the "handles uppercase input" test to cover other inputs
that share the normalization path so case-insensitivity can't regress: update
the test block in packages/opencode/test/altimate/data-diff-dialect.test.ts that
calls warehouseTypeToDialect to also assert uppercase variants such as "MSSQL"
-> "tsql", "FABRIC" -> "fabric" and "MARIADB" -> "mariadb" (or whichever
expected dialects map in warehouseTypeToDialect) so all branches that are
lowercased are exercised.
packages/drivers/src/sqlserver.ts (1)

89-100: execSync blocks the event loop during connect().

execSync("az account get-access-token ...") with a 15 s timeout runs synchronously on the main thread inside an async function — any other pending I/O on the runtime is stalled while Azure CLI spins up (typically 0.5–2 s, occasionally much longer). Prefer the async execFile (or promisify(exec)) so the token fetch cooperates with the event loop.

♻️ Suggested change
-            try {
-              const { execSync } = await import("node:child_process")
-              const out = execSync(
-                "az account get-access-token --resource https://database.windows.net/ --query accessToken -o tsv",
-                { encoding: "utf-8", timeout: 15000, stdio: ["pipe", "pipe", "pipe"] },
-              ).trim()
-              if (out) token = out
-            } catch {
-              // az CLI not installed or not logged in
-            }
+            try {
+              const { execFile } = await import("node:child_process")
+              const { promisify } = await import("node:util")
+              const run = promisify(execFile)
+              const { stdout } = await run(
+                "az",
+                ["account", "get-access-token", "--resource", "https://database.windows.net/", "--query", "accessToken", "-o", "tsv"],
+                { encoding: "utf-8", timeout: 15000 },
+              )
+              const out = stdout.trim()
+              if (out) token = out
+            } catch {
+              // az CLI not installed or not logged in
+            }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/drivers/src/sqlserver.ts` around lines 89 - 100, The current token
fetch inside connect() uses sync child process invocation (execSync) which
blocks the event loop; replace it with an asynchronous call (e.g., use
child_process.execFile or promisified exec via util.promisify) so the Azure CLI
call runs without blocking. Locate the block that imports execSync and sets
token (the token variable and the try/catch that calls "az account
get-access-token ...") and change it to call the async API (await the
execFile/promisified exec) with the same args and timeout, handle stdout.trim()
to set token, and preserve the existing catch behavior for missing CLI or auth
failures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.opencode/skills/data-parity/SKILL.md:
- Around line 453-458: The fenced code block showing the pseudo-YAML config (the
block containing type: "fabric", host:
"<workspace-id>-<item-id>.datawarehouse.fabric.microsoft.com", database:
"<warehouse-name>", authentication: "azure-active-directory-default") is missing
a language tag; change the opening fence from ``` to ```yaml (or ```text) so
markdownlint MD040 is satisfied and the block is treated as YAML.

In `@packages/drivers/package.json`:
- Around line 20-33: Remove the redundant optional peer dependency entry for
"@azure/identity" from package.json: delete the "@azure/identity" key inside the
"peerDependencies" object and its corresponding entry in "peerDependenciesMeta"
since "mssql"/"tedious" already bundles it; update the package.json so only the
other dependencies (e.g., "mssql", "oracledb", "duckdb", "mongodb",
"@clickhouse/client") remain and no "@azure/identity" references exist under
"peerDependencies" or "peerDependenciesMeta".

In `@packages/drivers/src/sqlserver.ts`:
- Around line 53-54: The code calls rawAuth.toLowerCase() without ensuring
rawAuth is a string, causing a TypeError for non-string authentication values;
update the computation of authType in the sqlserver driver to first check typeof
rawAuth === "string" before using toLowerCase and looking up AUTH_SHORTHANDS,
e.g. only apply AUTH_SHORTHANDS[rawAuth.toLowerCase()] when rawAuth is a string,
otherwise pass rawAuth through (or leave undefined) so non-string auth blocks
(config.authentication) don't blow up at runtime; adjust the logic around
rawAuth, authType, and AUTH_SHORTHANDS to be type-safe and explicit.
- Around line 56-158: The code path that handles authType starting with
"azure-active-directory" (see authType variable and mssqlConfig.authentication
assignments) does not handle unknown subtypes and can leave mssqlConfig without
authentication; add a terminal else branch after the existing
azure-active-directory-* branches that throws a clear Error (e.g. mentioning the
unsupported authType) so unknown or misspelled Azure AD subtypes fail fast;
ensure the thrown error references authType so callers get an explicit message
instead of a later opaque tedious error.
- Around line 76-87: The empty catch around the `@azure/identity` import and
credential.getToken is hiding real authentication errors; change the catch to
capture the thrown error (e.g., catch (err) { azureIdentityErr = err }) and
preserve that error in a local variable, then either append its message/details
to the final thrown error or log it at debug level when token remains undefined;
update the block that currently throws the generic "install `@azure/identity` or
run az login" message to include the captured azureIdentityErr (or its
message/stack) so real AAD/authentication failures are visible while still
tolerating missing-module/browser-bundle cases.
- Around line 208-217: The current columns logic replaces known named headers
when sampleFlat is longer (mixed named + unnamed aggregates); change the
construction in the columns branch to preserve the existing named prefix
(namedKeys) and only synthesize column names for the extra flattened entries
(e.g., generate col_i for indices >= namedKeys.length up to
sampleFlat.length-1), falling back to result.recordset.columns only when both
namedKeys and sampleFlat are empty; update the conditional around
sampleFlat/namedKeys where columns is assigned (the expression that uses
sampleFlat, namedKeys, recordset, and result.recordset.columns) so known headers
are retained and only the tail is renamed.
- Around line 162-167: Remove the fallback to the global shared pool to avoid
reintroducing cross-database isolation; in the block that currently checks
MssqlConnectionPool, ensure that if MssqlConnectionPool is falsy you throw an
explicit error instead of calling mssql.connect, i.e., keep the branch that
constructs and connects a new MssqlConnectionPool(mssqlConfig) and replace the
else branch with a throw that references MssqlConnectionPool and mssql.connect
in the message so reviewers can quickly locate and understand the change.

In `@packages/drivers/test/sqlserver-unit.test.ts`:
- Around line 154-159: The test "empty result returns correctly" uses a fake
mssql response shape by setting mockQueryResult.recordset_columns, but the
driver reads result.recordset?.columns; update the test to match real mssql
shape by removing recordset_columns and instead set mockQueryResult.recordset =
[] and, if you need to include column metadata, set
mockQueryResult.recordset.columns = {} (or leave absent for empty results) so
the driver code that checks result.recordset?.columns and the code paths in
connector.execute behave against the real response shape.

---

Nitpick comments:
In `@packages/drivers/src/sqlserver.ts`:
- Around line 89-100: The current token fetch inside connect() uses sync child
process invocation (execSync) which blocks the event loop; replace it with an
asynchronous call (e.g., use child_process.execFile or promisified exec via
util.promisify) so the Azure CLI call runs without blocking. Locate the block
that imports execSync and sets token (the token variable and the try/catch that
calls "az account get-access-token ...") and change it to call the async API
(await the execFile/promisified exec) with the same args and timeout, handle
stdout.trim() to set token, and preserve the existing catch behavior for missing
CLI or auth failures.

In `@packages/drivers/test/sqlserver-unit.test.ts`:
- Around line 483-489: Add a new test variant in
packages/drivers/test/sqlserver-unit.test.ts that uses a recordset like { name:
"alice", "": [42, 100] } (i.e., multiple unnamed values) and assert both
result.rows and result.columns so the header preserves the named column "name"
and includes generated column names for the unnamed values (e.g.,
["name","col_1","col_2"]); this will surface the bug referenced around
sqlserver.ts (logic near lines handling column mapping) and prevent regressions
in connector.execute's column assembly.

In `@packages/opencode/src/altimate/native/connections/data-diff.ts`:
- Around line 815-820: Summary: crossWarehouse currently does a raw textual
compare (sourceWarehouse !== targetWarehouse) which can misclassify undefined vs
an explicit default warehouse name as cross-warehouse. Fix: canonicalize both
sides before comparing by resolving undefined/empty values to the
effective/default warehouse name used elsewhere (use the same resolver used for
connection/effective settings) instead of comparing raw params.source_warehouse
and params.target_warehouse; update the variables referenced (sourceWarehouse,
targetWarehouse, crossWarehouse) so crossWarehouse = resolvedSourceWarehouse !==
resolvedTargetWarehouse. Ensure you call the existing resolution helper (or add
one) rather than inventing a new default string.

In `@packages/opencode/test/altimate/data-diff-dialect.test.ts`:
- Around line 51-54: Add assertions to the "handles uppercase input" test to
cover other inputs that share the normalization path so case-insensitivity can't
regress: update the test block in
packages/opencode/test/altimate/data-diff-dialect.test.ts that calls
warehouseTypeToDialect to also assert uppercase variants such as "MSSQL" ->
"tsql", "FABRIC" -> "fabric" and "MARIADB" -> "mariadb" (or whichever expected
dialects map in warehouseTypeToDialect) so all branches that are lowercased are
exercised.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 6589a3e0-5c1a-4ac6-bd8a-e405334413ca

📥 Commits

Reviewing files that changed from the base of the PR and between c438bb0 and 1977232.

⛔ Files ignored due to path filters (1)
  • bun.lock is excluded by !**/*.lock
📒 Files selected for processing (12)
  • .opencode/skills/data-parity/SKILL.md
  • packages/drivers/package.json
  • packages/drivers/src/normalize.ts
  • packages/drivers/src/sqlserver.ts
  • packages/drivers/test/sqlserver-unit.test.ts
  • packages/opencode/src/altimate/native/connections/data-diff.ts
  • packages/opencode/src/altimate/native/connections/registry.ts
  • packages/opencode/src/altimate/tools/data-diff.ts
  • packages/opencode/test/altimate/connections.test.ts
  • packages/opencode/test/altimate/data-diff-cte.test.ts
  • packages/opencode/test/altimate/data-diff-dialect.test.ts
  • packages/opencode/test/altimate/driver-normalize.test.ts

Fixes five correctness, reliability, and portability issues surfaced by the
consensus code review of this branch.

CRITICAL #1 — Cross-dialect partitioned diff (`data-diff.ts`):
`runPartitionedDiff` built one partition WHERE clause with `sourceDialect`
and passed it as shared `where_clause` to the recursive `runDataDiff`, which
applied it to both warehouses identically. Cross-dialect partition mode
(MSSQL → Postgres) failed because the target received T-SQL
`DATETRUNC`/`CONVERT(DATE, …, 23)`. Now builds per-side WHERE using each
warehouse's dialect and bakes it into dialect-quoted subquery SQL for source
and target independently. The existing side-aware CTE injection handles the
rest.

MAJOR #2 — Azure AD token caching and refresh (`sqlserver.ts`):
`acquireAzureToken` fetched a fresh token on every `connect()` and embedded
it in the pool config with no refresh. Long-lived sessions silently failed
when the ~1h token expired. Adds a module-scoped cache keyed by
`(resource, client_id)` with proactive refresh 5 min before expiry, parsing
`expiresOnTimestamp` from `@azure/identity` or the JWT `exp` claim from the
`az` CLI fallback. Exposes `_resetTokenCacheForTests` for isolation.

MAJOR #3 — `joindiff` + cross-warehouse guard (`data-diff.ts`):
Explicit `algorithm: "joindiff"` combined with different warehouses produced
broken SQL (one task referencing two CTE aliases with only one injected).
Now returns an early error with a clear message steering users to `hashdiff`
or `auto`. Cross-warehouse detection switched from warehouse-name string
compare to dialect compare, matching the underlying SQL-divergence invariant.

MAJOR #4 — Dialect-aware identifier quoting in CTE wrapping (`data-diff.ts`):
`resolveTableSources` wrapped plain-table names with ANSI double-quotes for
all dialects. T-SQL/Fabric require `QUOTED_IDENTIFIER ON` for this to work;
default for `mssql`/tedious is ON, but user contexts (stored procs, legacy
collations) can override. Now accepts source/target dialect parameters and
delegates to `quoteIdentForDialect`, which was hoisted to module scope so
it can be reused across partition and CTE paths.
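
A minimal sketch of the dialect-aware quoting described above (the escaping rules shown are assumptions; the real quoteIdentForDialect may differ):

```typescript
// Bracket quoting works in T-SQL/Fabric regardless of the QUOTED_IDENTIFIER
// setting; ANSI double quotes remain the default for other dialects.
function quoteIdentForDialect(ident: string, dialect: string): string {
  switch (dialect) {
    case "tsql":
    case "fabric":
      return `[${ident.replace(/]/g, "]]")}]`
    case "mysql":
    case "mariadb":
      return "`" + ident.replace(/`/g, "``") + "`"
    default:
      return `"${ident.replace(/"/g, '""')}"`
  }
}
```

Hoisting such a helper to module scope lets both the partition and CTE paths share one quoting rule per side.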

MAJOR #5 — Configurable Azure resource URL (`sqlserver.ts`, `normalize.ts`):
Token acquisition hardcoded `https://database.windows.net/`, blocking Azure
Government, Azure China, and sovereign-cloud customers. Now honours an
explicit `azure_resource_url` config field and otherwise infers the URL
from the host suffix (`.usgovcloudapi.net`, `.chinacloudapi.cn`). Adds the
usual camelCase/snake_case aliases in the SQL Server normalizer.
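
The inference described above might look roughly like this (the sovereign-cloud resource URLs are assumptions derived from the host suffixes named in the text):

```typescript
// Illustrative host-suffix → Azure resource URL resolution; an explicit
// azure_resource_url config value always wins over inference.
function resolveAzureResourceUrl(host: string, explicitUrl?: string): string {
  if (explicitUrl) return explicitUrl
  if (host.endsWith(".usgovcloudapi.net")) return "https://database.usgovcloudapi.net/"
  if (host.endsWith(".chinacloudapi.cn")) return "https://database.chinacloudapi.cn/"
  return "https://database.windows.net/" // commercial-cloud default
}
```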

Also surfaces Azure auth error causes: if both `@azure/identity` and `az`
CLI fail, the thrown error includes both hints (redacted) so users know why
rather than seeing the generic "install @azure/identity or run az login"
message.

Tests: adds `data-diff-cross-dialect.test.ts` covering the cross-dialect
partition WHERE routing and the `joindiff` guard; extends
`data-diff-cte.test.ts` with dialect-aware quoting assertions for tsql,
fabric, and mysql; extends `sqlserver-unit.test.ts` with cache hit / expiry
refresh / client-id keyed cache tests, commercial/gov/china/custom resource
URL resolution, and the combined-error-hints surface.

All 41 sqlserver driver tests, 24 data-diff orchestrator tests, and 214
normalize/connections tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 17, 2026 17:54

Copilot AI left a comment


Pull request overview

This PR adds first-class SQL Server (MSSQL) and Microsoft Fabric support to the data-parity/data-diff workflow, including dialect mapping to Rust engine dialect names, cross-warehouse-safe CTE injection, and expanded driver authentication/normalization/docs.

Changes:

  • Add SQL Server driver capabilities (Azure AD auth flows, TOP injection, schema introspection, result shaping) and upgrade mssql to v12.
  • Add warehouse-type → Rust dialect mapping (sqlserver/mssql → tsql, fabric → fabric, etc.) plus dialect-aware quoting and side-specific CTE injection for SQL-query mode and partitioned diffs.
  • Extend registry/normalization/tests/docs to recognize fabric as a supported connection type routed through the SQL Server driver.
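
As a rough illustration of the mapping and case normalization in the second bullet, the lookup might look like this (the table contents are assumed from the test expectations, not copied from data-diff.ts):

```typescript
// Hypothetical sketch of the warehouse-type → Rust dialect mapping; the real
// table lives in data-diff.ts and may cover more warehouse types.
const DIALECT_MAP: Record<string, string> = {
  sqlserver: "tsql",
  mssql: "tsql",
  fabric: "fabric",
  mariadb: "mariadb",
}

function warehouseTypeToDialect(warehouseType: string): string | undefined {
  // Lowercase first so "MSSQL", "Fabric", etc. hit the same branch.
  return DIALECT_MAP[warehouseType.toLowerCase()]
}
```

Uppercase inputs such as "MSSQL" then resolve through the same normalization path the dialect tests exercise.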

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
packages/opencode/test/altimate/driver-normalize.test.ts Adds coverage ensuring fabric configs normalize using SQL Server aliasing.
packages/opencode/test/altimate/data-diff-dialect.test.ts Adds tests for warehouse-type → Rust dialect mapping.
packages/opencode/test/altimate/data-diff-cte.test.ts Adds tests for query detection, dialect-aware quoting, and side-specific CTE injection.
packages/opencode/test/altimate/data-diff-cross-dialect.test.ts Adds tests for cross-dialect partition filtering and joindiff cross-warehouse guard behavior.
packages/opencode/test/altimate/connections.test.ts Adds coverage for fabric being recognized/registered as a connection type.
packages/opencode/src/altimate/tools/data-diff.ts Adjusts diff output formatting for missing diff row values.
packages/opencode/src/altimate/native/connections/registry.ts Adds fabric → sqlserver driver routing.
packages/opencode/src/altimate/native/connections/data-diff.ts Implements dialect mapping, dialect-aware quoting, side-specific CTE injection, and cross-dialect partition handling.
packages/drivers/test/sqlserver-unit.test.ts Adds extensive unit tests for the SQL Server driver, auth flows, and result shaping.
packages/drivers/src/sqlserver.ts Implements SQL Server driver with Azure AD token acquisition/caching, ConnectionPool isolation, and result shaping.
packages/drivers/src/normalize.ts Adds SQL Server/Fabric config alias normalization for auth and Azure fields.
packages/drivers/package.json Upgrades mssql dependency and adds optional peer dependency on @azure/identity.
bun.lock Updates lockfile for mssql/tedious upgrade and peer dependency metadata.
.opencode/skills/data-parity/SKILL.md Documents SQL Server/Fabric configuration, requirements, and behavior.


  for (const d of diffRows.slice(0, 5)) {
    const label = d.sign === "-" ? "source only" : "target only"
-   lines.push(` [${label}] ${d.values?.join(" | ")}`)
+   lines.push(` [${label}] ${d.values?.join(" | ") ?? "(no values)"}`)

Copilot AI Apr 17, 2026


The new (no values) fallback won’t trigger for values: [] because [].join(" | ") returns an empty string (not null/undefined). If the intent is to show the placeholder when there are zero values, check d.values?.length (or similar) instead of using ?? on the joined string.

Suggested change
-lines.push(` [${label}] ${d.values?.join(" | ") ?? "(no values)"}`)
+const values = d.values?.length ? d.values.join(" | ") : "(no values)"
+lines.push(` [${label}] ${values}`)

Copilot uses AI. Check for mistakes.
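
The distinction is easy to demonstrate in isolation:

```typescript
// `??` only substitutes for null/undefined; an empty array's join returns "",
// which passes through the nullish check unchanged.
const empty: string[] = []

const viaNullish = empty.join(" | ") ?? "(no values)" // ""
const viaLength = empty.length ? empty.join(" | ") : "(no values)" // "(no values)"

console.log(JSON.stringify(viaNullish))
console.log(viaLength)
```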
Comment on lines +302 to +312
       const rows = limitedRecordset.map(flattenRow)
+      const sampleFlat = rows.length > 0 ? rows[0] : []
+      const namedKeys = recordset.length > 0 ? Object.keys(recordset[0]) : []
       const columns =
-        rows.length > 0
-          ? Object.keys(rows[0]).filter((k) => !k.startsWith("_"))
-          : (result.recordset?.columns
-              ? Object.keys(result.recordset.columns)
-              : [])
-      const truncated = effectiveLimit > 0 && rows.length > effectiveLimit
-      const limitedRows = truncated ? rows.slice(0, effectiveLimit) : rows
+        namedKeys.length === sampleFlat.length
+          ? namedKeys
+          : sampleFlat.length > 0
+            ? sampleFlat.map((_: any, i: number) => `col_${i}`)
+            : (result.recordset?.columns
+                ? Object.keys(result.recordset.columns)
+                : [])

Copilot AI Apr 17, 2026


Column name derivation becomes inconsistent when the mssql driver returns unnamed columns under the empty-string key ({"": [..]}) and there are named columns. With { name: "alice", "": [42] }, namedKeys.length === sampleFlat.length so columns becomes ["name", ""], leaving an empty column name and making it ambiguous/misaligned for callers. Consider building columns in the same loop as flattenRow (expanding the empty-string key into synthetic col_N names) so columns.length always matches rows[i].length and never contains "".

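
A hypothetical sketch of that single-pass derivation (the function name and shapes are illustrative, not the driver's code):

```typescript
// Build column names and the flattened row together so they can never diverge:
// the empty-string key (mssql's bucket for unnamed columns) expands into
// synthetic col_N names, one per value.
function deriveColumnsAndRow(row: Record<string, unknown>): { columns: string[]; flat: unknown[] } {
  const columns: string[] = []
  const flat: unknown[] = []
  for (const [key, value] of Object.entries(row)) {
    if (key === "" && Array.isArray(value)) {
      for (const v of value) {
        columns.push(`col_${columns.length}`)
        flat.push(v)
      }
    } else {
      columns.push(key === "" ? `col_${columns.length}` : key)
      flat.push(value)
    }
  }
  return { columns, flat }
}

const sample = deriveColumnsAndRow({ name: "alice", "": [42, 100] })
// sample.columns is ["name", "col_1", "col_2"]; sample.flat is ["alice", 42, 100]
```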
Comment on lines +865 to +876
// Cross-warehouse mode requires side-specific CTE injection: T-SQL / Fabric
// parse-bind every CTE body even when unreferenced, so sending the combined
// prefix to a warehouse that lacks the other side's base table fails at parse.
// Detect by dialect compare — more robust than warehouse-name compare, and
// matches the underlying invariant that we care about (SQL-text divergence).
const crossWarehouse = dialect1 !== dialect2

// Explicit JoinDiff cannot work across warehouses: it emits one FULL OUTER
// JOIN task referencing both CTE aliases, but side-aware injection only
// defines one side per task — the other alias would be unresolved. Guard
// early so users get a clear error instead of an obscure SQL parse failure.
if (params.algorithm === "joindiff" && crossWarehouse) {

Copilot AI Apr 17, 2026


crossWarehouse is computed from dialect1 !== dialect2, but cross-warehouse behavior/guards should depend on whether source_warehouse and target_warehouse are different connections. If both sides are SQL Server (same dialect) but different warehouses, this will incorrectly allow algorithm: "joindiff" and can route a JOIN query to a warehouse that cannot resolve the other side’s table. Compute cross-warehouse based on warehouse identity (e.g. params.target_warehouse && params.target_warehouse !== params.source_warehouse) and keep dialect comparison only for SQL-text/dialect branching.

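
A sketch of the separation suggested above (helper names are illustrative; the field names follow the comment):

```typescript
// Warehouse identity decides routing and the joindiff guard; dialect
// divergence decides only how SQL text is generated per side.
interface DiffParams {
  source_warehouse?: string
  target_warehouse?: string
}

function isCrossWarehouse(params: DiffParams): boolean {
  return Boolean(params.target_warehouse) && params.target_warehouse !== params.source_warehouse
}

function isCrossDialect(dialect1: string, dialect2: string): boolean {
  return dialect1 !== dialect2
}
```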

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (3)
packages/drivers/src/sqlserver.ts (2)

255-262: ⚠️ Potential issue | 🟡 Minor

Fallback to mssql.connect() still defeats per-pool isolation.

This was raised in a prior review and remains unresolved. Since packages/drivers/package.json now pins mssql@^12.0.0 where ConnectionPool is a stable named export, MssqlConnectionPool should never be falsy in practice; if it somehow is, silently using the global shared pool re-introduces the cross-database interference bug the explicit ConnectionPool path was added to fix. Throw instead.

🔒️ Proposed fix
-      if (MssqlConnectionPool) {
-        pool = new MssqlConnectionPool(mssqlConfig)
-        await pool.connect()
-      } else {
-        pool = await mssql.connect(mssqlConfig)
-      }
+      if (!MssqlConnectionPool) {
+        throw new Error(
+          "mssql.ConnectionPool is not available — the installed `mssql` package is too old. Upgrade to mssql@^12.",
+        )
+      }
+      pool = new MssqlConnectionPool(mssqlConfig)
+      await pool.connect()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/drivers/src/sqlserver.ts` around lines 255 - 262, The fallback
branch that calls mssql.connect() reintroduces the global shared pool bug;
replace the fallback with a hard failure instead of silently using the global
pool: when MssqlConnectionPool is falsy, throw a descriptive error indicating
ConnectionPool is required (referencing MssqlConnectionPool, ConnectionPool, and
the pool initialization logic) so callers know the driver cannot operate without
the explicit ConnectionPool; do not call mssql.connect() anywhere in this
module.

302-312: ⚠️ Potential issue | 🟡 Minor

Mixed named + multi-unnamed columns still drop named headers.

Already flagged in an earlier review and not addressed. For SELECT name, COUNT(*), SUM(x) FROM t (recordset row { name: "alice", "": [42, 100] }), namedKeys.length is 2 but sampleFlat.length is 3, so the branch falls through to synthesizing col_0..col_2, losing the real name header. Preserve the known prefix and only synthesize names for the flattened tail.

♻️ Proposed fix
-      const rows = limitedRecordset.map(flattenRow)
-      const sampleFlat = rows.length > 0 ? rows[0] : []
-      const namedKeys = recordset.length > 0 ? Object.keys(recordset[0]) : []
-      const columns =
-        namedKeys.length === sampleFlat.length
-          ? namedKeys
-          : sampleFlat.length > 0
-            ? sampleFlat.map((_: any, i: number) => `col_${i}`)
-            : (result.recordset?.columns
-                ? Object.keys(result.recordset.columns)
-                : [])
+      const rows = limitedRecordset.map(flattenRow)
+      const sampleFlat = rows.length > 0 ? rows[0] : []
+      const sampleRow = recordset[0] ?? {}
+      const columns: string[] = []
+      if (sampleFlat.length === 0) {
+        if (result.recordset?.columns) columns.push(...Object.keys(result.recordset.columns))
+      } else {
+        let i = 0
+        for (const [k, v] of Object.entries(sampleRow)) {
+          if (k === "" && Array.isArray(v)) {
+            for (let j = 0; j < v.length; j++) columns.push(`col_${i++}`)
+          } else {
+            columns.push(k === "" ? `col_${i}` : k)
+            i++
+          }
+        }
+      }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/drivers/src/sqlserver.ts` around lines 302 - 312, The code currently
drops known named headers when some columns are flattened (variables:
limitedRecordset, flattenRow, rows, sampleFlat, namedKeys, columns), so change
the columns construction to preserve the namedKeys prefix and only synthesize
names for any extra flattened entries: if namedKeys.length === sampleFlat.length
use namedKeys; if namedKeys.length < sampleFlat.length set columns =
[...namedKeys, ...sampleFlat.slice(namedKeys.length).map((_, i) =>
`col_${namedKeys.length + i}`)]; otherwise (sampleFlat shorter) keep the current
fallback logic that uses sampleFlat or result.recordset?.columns as before.
Ensure this logic replaces the current ternary chain building columns.
packages/drivers/test/sqlserver-unit.test.ts (1)

176-181: ⚠️ Potential issue | 🟡 Minor

recordset_columns is still not a real mssql shape.

Previously flagged; the key remains incorrect. mssql exposes metadata at result.recordset.columns, not as a sibling recordset_columns, and the driver (see sqlserver.ts line 310) reads result.recordset?.columns. The test passes only because recordset is empty, so future readers will copy the wrong pattern.

💚 Proposed fix
-      mockQueryResult = { recordset: [], recordset_columns: {} }
+      const recordset: any = []
+      recordset.columns = {}
+      mockQueryResult = { recordset }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/drivers/test/sqlserver-unit.test.ts` around lines 176 - 181, The
test constructs a fake mssql result with the wrong metadata shape—using
mockQueryResult.recordset_columns instead of the real mssql shape
result.recordset.columns—so update the test ("empty result returns correctly" in
packages/drivers/test/sqlserver-unit.test.ts) to build mockQueryResult with
recordset: [] and an empty columns object under recordset.columns (i.e.,
mockQueryResult.recordset = Object.assign([], { columns: {} })) so the driver
code in sqlserver.ts (which reads result.recordset?.columns) exercises the
correct shape when calling connector.execute("SELECT * FROM t").
🧹 Nitpick comments (3)
packages/opencode/test/altimate/data-diff-cross-dialect.test.ts (1)

114-127: Strengthen the same-warehouse joindiff positive-path assertions.

This test only checks result.success === true, which passes as long as no guard fires — it won’t catch regressions where joindiff silently degrades (e.g., falls back to hashdiff, skips execution, or produces empty SQL). Since the negative test above asserts sqlLog.length === 0, the symmetric positive assertion is cheap and much stronger.

🧪 Suggested additional assertions
     const result = await runDataDiff({
       source: "dbo.orders",
       target: "dbo.orders_v2",
       key_columns: ["id"],
       source_warehouse: "shared",
       target_warehouse: "shared",
       algorithm: "joindiff",
     })
     expect(result.success).toBe(true)
+    // Joindiff must actually dispatch SQL to the shared warehouse
+    expect(sqlLog.length).toBeGreaterThan(0)
+    expect(sqlLog.every((x) => x.warehouse === "shared")).toBe(true)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/test/altimate/data-diff-cross-dialect.test.ts` around lines
114 - 127, The test "same-warehouse joindiff is allowed" currently only checks
result.success; strengthen it by also asserting that the joindiff actually
executed SQL instead of silently degrading: after calling runDataDiff (the test
using Registry.setConfigs and runDataDiff), assert that sqlLog (the request/SQL
capture used in other tests) contains entries (e.g., sqlLog.length > 0) and that
the executed statement or logged payload indicates joindiff behavior (e.g.,
contains "JOIN" or the joindiff-specific marker), so failures where joindiff
falls back or produces no SQL are caught.
packages/drivers/src/sqlserver.ts (2)

170-186: execSync blocks the event loop inside an async path.

The Azure CLI fallback uses synchronous execSync with a 15-second timeout inside acquireAzureToken(), which will stall the entire event loop while waiting for az account get-access-token. Prefer execFile promisified (or spawn) so that other tasks (e.g. in-flight pool requests, other connectors initializing in parallel) continue to progress. Also, passing the command as a single formatted string through execSync is a minor shell-injection concern if resourceUrl is ever derived from untrusted input — execFile with an argv array avoids that entirely.

♻️ Suggested approach
-            try {
-              const { execSync } = await import("node:child_process")
-              const out = execSync(
-                `az account get-access-token --resource ${resourceUrl} --query accessToken -o tsv`,
-                { encoding: "utf-8", timeout: 15000, stdio: ["pipe", "pipe", "pipe"] },
-              ).trim()
+            try {
+              const { execFile } = await import("node:child_process")
+              const { promisify } = await import("node:util")
+              const execFileP = promisify(execFile)
+              const { stdout } = await execFileP(
+                "az",
+                ["account", "get-access-token", "--resource", resourceUrl, "--query", "accessToken", "-o", "tsv"],
+                { encoding: "utf-8", timeout: 15000 },
+              )
+              const out = stdout.trim()
               if (out) {
                 token = out
                 expiresAt = parseTokenExpiry(out)
               }
             } catch (err: any) {
               azCliStderr = String(err?.stderr ?? err?.message ?? "").slice(0, 200).trim()
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/drivers/src/sqlserver.ts` around lines 170 - 186, The use of
synchronous execSync inside acquireAzureToken blocks the event loop; replace it
with a non-blocking child process call (e.g., promisified execFile or spawn) to
run "az" with arguments ["account","get-access-token","--resource",
resourceUrl,"--query","accessToken","-o","tsv"], capture stdout/stderr
asynchronously, enforce the 15s timeout via timers or the child process API, and
on success set token and expiresAt (via parseTokenExpiry(out)); on failure
capture stderr/message into azCliStderr (trim to 200 chars) and preserve
existing error-handling flow so behavior of token, expiresAt, and azCliStderr
remains the same.

140-204: Concurrent cache-miss calls will parallel-fetch the same token.

If two connect() calls for the same (resource, clientId) start before either populates the cache, both will independently run DefaultAzureCredential.getToken() (or the az CLI) and both write to the cache. This is wasteful for AAD throttling and surprising during app startup where several diffs connect in parallel. Caching the in-flight Promise<string> (rather than the resolved token) de-duplicates concurrent acquisitions for free — and on rejection you simply evict and let the next caller retry.

const inflight = new Map<string, Promise<string>>()
// inside acquireAzureToken:
const existing = inflight.get(cacheKey)
if (existing) return existing
const p = (async () => { /* existing acquisition body */ })()
  .finally(() => inflight.delete(cacheKey))
inflight.set(cacheKey, p)
return p
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/drivers/src/sqlserver.ts` around lines 140 - 204, acquireAzureToken
currently allows concurrent cache-miss callers to run duplicate AAD/az CLI
fetches; introduce an inflight Map<string, Promise<string>> (e.g., inflight)
keyed by cacheKey and at the start of acquireAzureToken return an existing
inflight.get(cacheKey) if present, otherwise wrap the entire acquisition logic
in an async Promise, set inflight.set(cacheKey, promise) before awaiting it, and
in promise.finally() delete the inflight entry so failed attempts can be
retried; keep using tokenCache as before (check cached token first, write
resolved token into tokenCache) and on rejection ensure the inflight entry is
removed so subsequent callers will create a new fetch.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/drivers/test/sqlserver-unit.test.ts`:
- Around line 576-596: The "different clientIds cache separately" test currently
only asserts both connections received "shared-token" which doesn't prove
separate caching; modify the test in the test block named "different clientIds
cache separately" to either (1) make azureIdentityState.tokenOverride return
distinct tokens per call (so the first connect yields "token-1" and the second
"token-2") and assert mockConnectCalls[0].authentication.options.token and
mockConnectCalls[1].authentication.options.token differ accordingly, or (2) keep
the shared-token but assert the underlying getToken mock was invoked twice (e.g.
check the getToken invocation counter after resetMocks and the two connect()
calls equals 2) to prove cache keying is per azure_client_id; refer to
azureIdentityState.tokenOverride, resetMocks(), connect(...), and
mockConnectCalls to implement the change.

In `@packages/opencode/test/altimate/data-diff-cross-dialect.test.ts`:
- Around line 27-42: The module-level mocks created with
mock.module("@altimateai/altimate-core", ...) (and the registry mock) can leak
across tests; add cleanup by calling mock.restoreModule() in an afterEach (or
afterAll) to restore original module exports, or replace the module-level
mock.module usage with scoped spyOn calls inside the test suite setup (e.g.,
inside beforeEach) so the mock is not process-scoped; update the test file to
call mock.restoreModule() after each test or convert the mocks to spyOn to
prevent cross-file leakage.

---

Duplicate comments:
In `@packages/drivers/src/sqlserver.ts`:
- Around line 255-262: The fallback branch that calls mssql.connect()
reintroduces the global shared pool bug; replace the fallback with a hard
failure instead of silently using the global pool: when MssqlConnectionPool is
falsy, throw a descriptive error indicating ConnectionPool is required
(referencing MssqlConnectionPool, ConnectionPool, and the pool initialization
logic) so callers know the driver cannot operate without the explicit
ConnectionPool; do not call mssql.connect() anywhere in this module.
- Around line 302-312: The code currently drops known named headers when some
columns are flattened (variables: limitedRecordset, flattenRow, rows,
sampleFlat, namedKeys, columns), so change the columns construction to preserve
the namedKeys prefix and only synthesize names for any extra flattened entries:
if namedKeys.length === sampleFlat.length use namedKeys; if namedKeys.length <
sampleFlat.length set columns = [...namedKeys,
...sampleFlat.slice(namedKeys.length).map((_, i) => `col_${namedKeys.length +
i}`)]; otherwise (sampleFlat shorter) keep the current fallback logic that uses
sampleFlat or result.recordset?.columns as before. Ensure this logic replaces
the current ternary chain building columns.
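The prefix-preserving construction described above can be sketched as a small function. `namedKeys` and `sampleFlat` mirror the variables named in the comment; the function wrapper itself is illustrative, not the driver's actual shape:

```typescript
function deriveColumns(namedKeys: string[], sampleFlat: unknown[]): string[] {
  // Counts match: the named headers already describe every flattened value.
  if (namedKeys.length === sampleFlat.length) return namedKeys
  if (namedKeys.length < sampleFlat.length) {
    // Keep the known headers; synthesize names only for the extra slots
    // produced by flattening unnamed columns.
    return [
      ...namedKeys,
      ...sampleFlat
        .slice(namedKeys.length)
        .map((_, i) => `col_${namedKeys.length + i}`),
    ]
  }
  // sampleFlat shorter: fall back to the positions the row actually has
  // (the real driver would also consult result.recordset?.columns here).
  return sampleFlat.map((_, i) => namedKeys[i] ?? `col_${i}`)
}
```

With this, `SELECT name, COUNT(*), SUM(x)` yields `["name", "col_1", "col_2"]` instead of dropping `name` or emitting a blank header.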

In `@packages/drivers/test/sqlserver-unit.test.ts`:
- Around line 176-181: The test constructs a fake mssql result with the wrong
metadata shape—using mockQueryResult.recordset_columns instead of the real mssql
shape result.recordset.columns—so update the test ("empty result returns
correctly" in packages/drivers/test/sqlserver-unit.test.ts) to build
mockQueryResult with recordset: [] and an empty columns object under
recordset.columns (i.e., mockQueryResult.recordset = Object.assign([], {
columns: {} })) so the driver code in sqlserver.ts (which reads
result.recordset?.columns) exercises the correct shape when calling
connector.execute("SELECT * FROM t").

---

Nitpick comments:
In `@packages/drivers/src/sqlserver.ts`:
- Around line 170-186: The use of synchronous execSync inside acquireAzureToken
blocks the event loop; replace it with a non-blocking child process call (e.g.,
promisified execFile or spawn) to run "az" with arguments
["account","get-access-token","--resource",
resourceUrl,"--query","accessToken","-o","tsv"], capture stdout/stderr
asynchronously, enforce the 15s timeout via timers or the child process API, and
on success set token and expiresAt (via parseTokenExpiry(out)); on failure
capture stderr/message into azCliStderr (trim to 200 chars) and preserve
existing error-handling flow so behavior of token, expiresAt, and azCliStderr
remains the same.
- Around line 140-204: acquireAzureToken currently allows concurrent cache-miss
callers to run duplicate AAD/az CLI fetches; introduce an inflight Map<string,
Promise<string>> (e.g., inflight) keyed by cacheKey and at the start of
acquireAzureToken return an existing inflight.get(cacheKey) if present,
otherwise wrap the entire acquisition logic in an async Promise, set
inflight.set(cacheKey, promise) before awaiting it, and in promise.finally()
delete the inflight entry so failed attempts can be retried; keep using
tokenCache as before (check cached token first, write resolved token into
tokenCache) and on rejection ensure the inflight entry is removed so subsequent
callers will create a new fetch.

In `@packages/opencode/test/altimate/data-diff-cross-dialect.test.ts`:
- Around line 114-127: The test "same-warehouse joindiff is allowed" currently
only checks result.success; strengthen it by also asserting that the joindiff
actually executed SQL instead of silently degrading: after calling runDataDiff
(the test using Registry.setConfigs and runDataDiff), assert that sqlLog (the
request/SQL capture used in other tests) contains entries (e.g., sqlLog.length >
0) and that the executed statement or logged payload indicates joindiff behavior
(e.g., contains "JOIN" or the joindiff-specific marker), so failures where
joindiff falls back or produces no SQL are caught.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: dd24dc82-bab2-4760-ab04-8fc4b312c420

📥 Commits

Reviewing files that changed from the base of the PR and between 1977232 and 872e082.

📒 Files selected for processing (6)
  • packages/drivers/src/normalize.ts
  • packages/drivers/src/sqlserver.ts
  • packages/drivers/test/sqlserver-unit.test.ts
  • packages/opencode/src/altimate/native/connections/data-diff.ts
  • packages/opencode/test/altimate/data-diff-cross-dialect.test.ts
  • packages/opencode/test/altimate/data-diff-cte.test.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/drivers/src/normalize.ts
  • packages/opencode/test/altimate/data-diff-cte.test.ts
  • packages/opencode/src/altimate/native/connections/data-diff.ts

Comment thread packages/drivers/test/sqlserver-unit.test.ts
Comment on lines +27 to +42
mock.module("@altimateai/altimate-core", () => ({
DataParitySession: class {
constructor(specJson: string) { lastSpec = JSON.parse(specJson) }
start() { return fakeStartAction }
step(_responses: string) {
return JSON.stringify({
type: "Done",
outcome: {
mode: "diff",
diff_rows: [],
stats: { rows_table1: 0, rows_table2: 0, exclusive_table1: 0, exclusive_table2: 0, updated: 0, unchanged: 0 },
},
})
}
},
}))

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find other tests that import the mocked modules and could observe the stubs.
rg -nP --type=ts -C2 '@altimateai/altimate-core|native/connections/(registry|data-diff)' packages/opencode/test
# Check whether any existing test uses mock.restore / module unmocking for these.
rg -nP --type=ts -C2 'mock\.(restore|module)' packages/opencode/test

Repository: AltimateAI/altimate-code

Length of output: 41367


🌐 Web query:

Bun test mock.module scope process-level isolation across files

💡 Result:

In Bun test runner (as of 2026-04-17), mock.module does not provide automatic process-level isolation across test files. All tests run in a single shared process by default, so mock.module calls in one test file leak into other test files through the shared module cache. This is a known limitation documented in open GitHub issues #12823 (Bun mocks to be scoped to test file, 63 thumbs up, active as of 2026) and #6024 (Make bun test isolated). Key evidence:

- Official docs confirm single-process execution: "The test runner runs all tests in a single process." There is no full process isolation like Vitest/Jest --isolate.
- mock.module modifies the shared ESM/CJS module caches globally, with live bindings propagating changes.
- mock.restore clears function mocks/spies but explicitly does NOT restore module mocks (per docs).
- Recent PR #25844 (Jan 2026) adds mock.restoreModule for manual restoration (selective or all), intended for use in afterEach to improve isolation within the shared process. It claims to fix #12823, but its merge status is unclear.

Workarounds:

1. Run test files individually: `for file in *.test.ts; do bun test "$file"; done` (each invocation spawns a separate process).
2. Use mock.restoreModule in afterEach if available in your Bun version.
3. Use spyOn on specific methods instead of full module mocks (better scoped).
4. Combine related tests into a single file with describe blocks.

No --isolate flag exists; an experimental --isolate=lite PR #10210 (2024) was closed without merging. For true process-level isolation across files, Bun lacks native support; use external tooling or run files separately. Scripted sequential per-file execution sacrifices parallelism but guarantees clean mocks per process.


Add module mock cleanup to prevent leakage across test files.

mock.module("@altimateai/altimate-core", …) and mock.module("…/registry", …) have no cleanup in afterEach or afterAll. In Bun, mock.module replacements are process-scoped (single shared process across all test files). Without cleanup, these stubs can leak to other test files that import @altimateai/altimate-core or the registry (e.g., data-diff-cte.test.ts, data-diff-dialect.test.ts, connections.test.ts), causing those tests to silently observe the mock implementations depending on file execution order.

mock.restore() does not restore module mocks—only function/spy mocks. Use mock.restoreModule() in afterEach (available in Bun v1.1.43+, PR #25844) to restore the mocked modules to their original exports. Alternatively, apply the pattern used in check-e2e.test.ts or post-connect-suggestions.test.ts: replace mock.module with targeted spyOn calls scoped to the specific test suite, or restructure the test to avoid module-level mocks.

Also applies to lines 69–82: the registry mock there carries the same leakage risk.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/test/altimate/data-diff-cross-dialect.test.ts` around lines
27 - 42, The module-level mocks created with
mock.module("@altimateai/altimate-core", ...) (and the registry mock) can leak
across tests; add cleanup by calling mock.restoreModule() in an afterEach (or
afterAll) to restore original module exports, or replace the module-level
mock.module usage with scoped spyOn calls inside the test suite setup (e.g.,
inside beforeEach) so the mock is not process-scoped; update the test file to
call mock.restoreModule() after each test or convert the mocks to spyOn to
prevent cross-file leakage.


@cubic-dev-ai cubic-dev-ai bot left a comment


3 issues found across 6 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/drivers/src/sqlserver.ts">

<violation number="1" location="packages/drivers/src/sqlserver.ts:50">
P2: Normalize explicit `azure_resource_url` to include a trailing slash before appending `.default`, otherwise `getToken` can receive an invalid scope string.</violation>
</file>

<file name="packages/opencode/test/altimate/data-diff-cross-dialect.test.ts">

<violation number="1" location="packages/opencode/test/altimate/data-diff-cross-dialect.test.ts:97">
P2: Use the same dialect on both warehouses here. As written, this only covers mixed-dialect diffing and won't catch same-dialect cross-warehouse joindiff regressions.</violation>
</file>

<file name="packages/opencode/src/altimate/native/connections/data-diff.ts">

<violation number="1" location="packages/opencode/src/altimate/native/connections/data-diff.ts:870">
P1: `crossWarehouse` is derived from dialect equality, so joindiff is incorrectly allowed across different warehouses that use the same dialect.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

Comment thread packages/opencode/src/altimate/native/connections/data-diff.ts Outdated
*/
function resolveAzureResourceUrl(config: ConnectionConfig): string {
const explicit = config.azure_resource_url as string | undefined
if (explicit) return explicit

@cubic-dev-ai cubic-dev-ai bot Apr 17, 2026


P2: Normalize explicit azure_resource_url to include a trailing slash before appending .default, otherwise getToken can receive an invalid scope string.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/drivers/src/sqlserver.ts, line 50:

<comment>Normalize explicit `azure_resource_url` to include a trailing slash before appending `.default`, otherwise `getToken` can receive an invalid scope string.</comment>

<file context>
@@ -4,6 +4,65 @@
+ */
+function resolveAzureResourceUrl(config: ConnectionConfig): string {
+  const explicit = config.azure_resource_url as string | undefined
+  if (explicit) return explicit
+  const host = (config.host as string | undefined) ?? ""
+  if (host.includes(".usgovcloudapi.net") || host.includes(".datawarehouse.fabric.microsoft.us")) {
</file context>
Suggested change
if (explicit) return explicit
if (explicit) return explicit.endsWith("/") ? explicit : `${explicit}/`
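The suggested one-liner amounts to guaranteeing a separator before the `.default` scope suffix. A minimal sketch (function names illustrative, not from the codebase):

```typescript
// Without the trailing slash, the scope string would fuse into something like
// "https://database.windows.net.default", which getToken rejects.
function normalizeResourceUrl(explicit: string): string {
  return explicit.endsWith("/") ? explicit : `${explicit}/`
}

const scopeFor = (url: string): string => `${normalizeResourceUrl(url)}.default`
```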

test("returns early error when joindiff + cross-warehouse", async () => {
Registry.setConfigs({
src: { type: "sqlserver", host: "s1", database: "d" },
tgt: { type: "postgres", host: "s2", database: "d" },

@cubic-dev-ai cubic-dev-ai bot Apr 17, 2026


P2: Use the same dialect on both warehouses here. As written, this only covers mixed-dialect diffing and won't catch same-dialect cross-warehouse joindiff regressions.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/test/altimate/data-diff-cross-dialect.test.ts, line 97:

<comment>Use the same dialect on both warehouses here. As written, this only covers mixed-dialect diffing and won't catch same-dialect cross-warehouse joindiff regressions.</comment>

<file context>
@@ -0,0 +1,168 @@
+  test("returns early error when joindiff + cross-warehouse", async () => {
+    Registry.setConfigs({
+      src: { type: "sqlserver", host: "s1", database: "d" },
+      tgt: { type: "postgres", host: "s2", database: "d" },
+    })
+    const result = await runDataDiff({
</file context>

…lot)

Addresses the remaining issues raised by coderabbitai, cubic-dev-ai, and the
Copilot PR reviewer on top of the multi-model consensus fix.

### CRITICAL

- **`@azure/identity` peer dep removed** (`drivers/package.json`)
  `mssql@12` → `tedious@19` bundles `@azure/identity ^4.2.1` as a regular
  dependency. Declaring it here as an optional peer was redundant and caused
  transitive-version-drift concerns. Users get the correct version
  automatically through the tedious chain; our lazy import handles the
  browser-bundle edge case itself.

### MAJOR

- **Cross-dialect date partition literal normalization** (`data-diff.ts`)
  `buildPartitionDiscoverySQL` on MSSQL returns a JS `Date` object, stringified
  upstream as `"Mon Jan 01 2024 …"`. `CONVERT(DATE, …, 23)` rejects that format.
  Normalize `partitionValue` to ISO `yyyy-mm-dd` before dialect casting so the
  T-SQL/Fabric path works end-to-end on dates discovered from MSSQL sources.
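The normalization in this bullet can be sketched as follows, assuming `partitionValue` may arrive as a JS `Date` from the mssql driver (the function name is illustrative):

```typescript
// Normalize a discovered partition value to ISO yyyy-mm-dd so the T-SQL
// CONVERT(DATE, ..., 23) cast accepts it; non-Date values pass through.
function normalizePartitionValue(value: unknown): string {
  if (value instanceof Date) return value.toISOString().slice(0, 10)
  return String(value)
}
```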

- **`crossWarehouse` uses resolved warehouse identity** (`data-diff.ts`)
  Previous commit gated on dialect compare, which treated two independent
  MSSQL instances as "same warehouse" and would have let `joindiff` route a
  JOIN through a warehouse that can't resolve the other side's base tables.
  Now resolves both sides' warehouse name (falling back to the default
  warehouse when a side is omitted) and compares identities — identity-based
  gating handles both the "undefined vs default" case (cubic) and the
  "same-dialect, different instance" case (Copilot).
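The identity-based gating described here can be sketched as: resolve each side's warehouse name, fall back to the default when a side is omitted, and compare identities rather than dialects. All names below are illustrative:

```typescript
function isCrossWarehouse(
  sourceWarehouse: string | undefined,
  targetWarehouse: string | undefined,
  defaultWarehouse: string,
): boolean {
  // Resolve omitted sides to the default warehouse before comparing,
  // so "undefined vs default" is correctly treated as same-warehouse.
  const src = sourceWarehouse ?? defaultWarehouse
  const tgt = targetWarehouse ?? defaultWarehouse
  return src !== tgt
}
```

Dialect equality cannot express this: two independent MSSQL instances share a dialect but are still cross-warehouse, while an omitted side and the default warehouse share an identity.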

- **Drop `mssql.connect()` fallback** (`sqlserver.ts`)
  `mssql@^12` guarantees `ConnectionPool` as a named export. The fallback
  silently re-introduced the global-shared-pool bug this branch was added
  to fix. Now throws a descriptive error if `ConnectionPool` is missing —
  cross-database pool interference cannot regress.

- **Non-string `config.authentication` guarded** (`sqlserver.ts`)
  Caller passing a pre-built `{ type, options }` block (or `null`) previously
  crashed with `TypeError: rawAuth.toLowerCase is not a function`. Now only
  applies the shorthand lookup when `rawAuth` is a string; other values pass
  through so tedious can handle them or reject them with its own error.

- **Unknown `azure-active-directory-*` subtype fails fast** (`sqlserver.ts`)
  Typos or future tedious subtypes previously dropped through all `else if`
  branches, producing a config with `encrypt: true` but no `authentication`
  block. tedious then surfaced an opaque error far from the root cause.
  Now throws with the offending subtype and the supported list.

- **`execSync` replaced with async `exec`** (`sqlserver.ts`)
  The `az account get-access-token` CLI fallback previously blocked the
  event loop for up to 15s. Switched to `util.promisify(exec)` so the
  connection path stays non-blocking.

- **Mixed named + unnamed column derivation preserves headers** (`sqlserver.ts`)
  Previously `SELECT name, COUNT(*), SUM(x)` produced either `["name", ""]`
  (blank header) or `["col_0", "col_1", "col_2"]` (lost `name`). Rewrote
  column/row derivation to iterate in one pass, preserving known named
  columns and synthesizing `col_N` only for expanded `""`-key positions.

### MINOR

- **`(no values)` fallback for empty `diff_row.values` array** (`tools/data-diff.ts`)
  `[].join(" | ") ?? "(no values)"` never fires because `""` is falsy-but-not-
  nullish. Gate on `d.values?.length` instead.
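The distinction in this bullet is worth a two-line illustration: `[].join(" | ")` returns `""`, which is falsy but not nullish, so `?? "(no values)"` keeps the empty string. Gating on length fixes it (function name illustrative):

```typescript
function formatValues(values: string[] | undefined): string {
  // `values?.length` is 0 (falsy) for an empty array and undefined for a
  // missing one, so both fall through to the placeholder.
  return values?.length ? values.join(" | ") : "(no values)"
}
```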

### Test / docs

- `sqlserver-unit.test.ts`: token-cache client-id test now counts actual
  `getToken` invocations (previous version only verified both got the same
  mocked token, which proved nothing about keying).
- `sqlserver-unit.test.ts`: "empty result" test now mirrors the real mssql
  shape (`recordset.columns` is a property *on* the recordset array, not a
  sibling key).
- `sqlserver-unit.test.ts`: added mixed-column regression tests — "name +
  COUNT + SUM" and "single unnamed column" — to lock in the derivation fix.
- `sqlserver-unit.test.ts`: stubbed async `exec` via `util.promisify.custom`
  so tests drive both the `execSync` legacy path and the new async path.
- `SKILL.md`: Fabric config fenced block now declares `yaml` (markdownlint
  MD040).

All tests: 43/43 sqlserver driver + 238/238 opencode test suite.

Attribution: findings identified by coderabbitai, cubic-dev-ai, and the
Copilot PR reviewer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
packages/drivers/src/sqlserver.ts (1)

147-218: Consider single-flight dedup for concurrent cold-cache token acquisitions.

If two (or many) connect() calls for the same (resourceUrl, clientId) race before the first token lands in the cache, each call runs the full DefaultAzureCredential / az CLI path independently, which is exactly the "wasteful, risks throttling" scenario the cache comment flags. A small in-flight map (Map<string, Promise<string>>) next to tokenCache collapses concurrent cold requests into a single network roundtrip.

♻️ Suggested approach
 const tokenCache = new Map<string, { token: string; expiresAt: number }>()
+const inflightTokenRequests = new Map<string, Promise<string>>()

Then in acquireAzureToken:

         const acquireAzureToken = async (): Promise<string> => {
           const cached = tokenCache.get(cacheKey)
           if (cached && cached.expiresAt - Date.now() > TOKEN_REFRESH_MARGIN_MS) {
             return cached.token
           }
+          const existing = inflightTokenRequests.get(cacheKey)
+          if (existing) return existing
+          const p = (async () => {
+            // ... existing acquisition + cache-set logic ...
+            return token!
+          })().finally(() => inflightTokenRequests.delete(cacheKey))
+          inflightTokenRequests.set(cacheKey, p)
+          return p
         }

Low priority — only matters when multiple pools spin up concurrently at process start — but it's cheap insurance for the documented motivation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/drivers/src/sqlserver.ts` around lines 147 - 218, acquireAzureToken
can run multiple concurrent cold-cache fetches; add a single-flight map (e.g.
const inFlight = new Map<string, Promise<string>>()) alongside tokenCache and
use cacheKey as the key: at start of acquireAzureToken check if
inFlight.has(cacheKey) and return the stored Promise; if not, create the Promise
that performs the existing token acquisition logic (DefaultAzureCredential path
then az CLI fallback), store it in inFlight before awaiting internals, and
ensure you remove(inFlight) in both fulfillment and rejection handlers so
failures don't permanently block retries; keep existing tokenCache.set and
TOKEN_REFRESH_MARGIN_MS logic unchanged.
packages/drivers/test/sqlserver-unit.test.ts (1)

119-131: Optional: drop the execSync mock branch now that production uses async exec.

The driver no longer calls execSync (it was replaced with promisify(exec) in packages/drivers/src/sqlserver.ts, lines 183-189). The execSync stub here is dead mock surface; keeping it risks implying execSync is still a supported fallback path in the driver.

♻️ Suggested cleanup
 mock.module("node:child_process", () => ({
   ...realChildProcess,
-  execSync: (cmd: string) => {
-    cliState.lastCmd = cmd
-    if (cliState.throwError) {
-      const e: any = new Error(cliState.throwError.message ?? "az failed")
-      e.stderr = cliState.throwError.stderr
-      throw e
-    }
-    return cliState.output
-  },
   exec: execStub,
 }))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/drivers/test/sqlserver-unit.test.ts` around lines 119 - 131, Remove
the dead execSync branch from the test mock: in the mock.module call that
currently spreads realChildProcess and defines execSync and exec, delete the
execSync property and its closure (the cliState handling and thrown Error
creation) so only exec: execStub remains; keep the realChildProcess spread and
existing execStub usage (referencing mock.module, execStub, and cliState) to
reflect that production now uses the async promisified exec path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/drivers/src/sqlserver.ts`:
- Around line 186-200: The current call uses execAsync with a shell-interpolated
command including resourceUrl (user input), risking shell injection; replace
that usage by invoking execFile (or your promisified execFileAsync) to call "az"
with args ["account", "get-access-token", "--resource", resourceUrl, "--query",
"accessToken", "-o", "tsv"] so the value is passed as an argv element (no
shell), retain the same options (encoding, timeout) and preserve the existing
handling of stdout -> out -> token/expiresAt and azCliStderr in the catch; also
update the related test stub to mock execFile/execFileAsync instead of execAsync
or assert the argv array.

In `@packages/opencode/src/altimate/native/connections/data-diff.ts`:
- Around line 810-820: runPartitionedDiff currently replaces
params.source/params.target with SELECT subqueries and then calls runDataDiff,
which prevents discoverExtraColumns from recognizing real table names and causes
extra_columns to be lost; fix this by running discoverExtraColumns (and
capturing discoveredExcluded) inside runPartitionedDiff while
params.source/params.target are still plain table names, then pass the
discovered extra_columns into the recursive runDataDiff call (e.g., set
params.extra_columns and params.excluded_audit_columns or forward
discoveredExcluded) instead of letting runDataDiff attempt discovery on the
SELECT strings; ensure you do not set partition_column in the recursive call as
before.

---

Nitpick comments:
In `@packages/drivers/src/sqlserver.ts`:
- Around line 147-218: acquireAzureToken can run multiple concurrent cold-cache
fetches; add a single-flight map (e.g. const inFlight = new Map<string,
Promise<string>>()) alongside tokenCache and use cacheKey as the key: at start
of acquireAzureToken check if inFlight.has(cacheKey) and return the stored
Promise; if not, create the Promise that performs the existing token acquisition
logic (DefaultAzureCredential path then az CLI fallback), store it in inFlight
before awaiting internals, and ensure you remove(inFlight) in both fulfillment
and rejection handlers so failures don't permanently block retries; keep
existing tokenCache.set and TOKEN_REFRESH_MARGIN_MS logic unchanged.

In `@packages/drivers/test/sqlserver-unit.test.ts`:
- Around line 119-131: Remove the dead execSync branch from the test mock: in
the mock.module call that currently spreads realChildProcess and defines
execSync and exec, delete the execSync property and its closure (the cliState
handling and thrown Error creation) so only exec: execStub remains; keep the
realChildProcess spread and existing execStub usage (referencing mock.module,
execStub, and cliState) to reflect that production now uses the async
promisified exec path.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 2c693281-52dc-47f7-b49c-a79a1d7331cd

📥 Commits

Reviewing files that changed from the base of the PR and between 872e082 and 38cfb0e.

📒 Files selected for processing (6)
  • .opencode/skills/data-parity/SKILL.md
  • packages/drivers/package.json
  • packages/drivers/src/sqlserver.ts
  • packages/drivers/test/sqlserver-unit.test.ts
  • packages/opencode/src/altimate/native/connections/data-diff.ts
  • packages/opencode/src/altimate/tools/data-diff.ts
✅ Files skipped from review due to trivial changes (2)
  • packages/drivers/package.json
  • .opencode/skills/data-parity/SKILL.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/opencode/src/altimate/tools/data-diff.ts

Comment on lines +186 to +200
const { stdout } = await execAsync(
`az account get-access-token --resource ${resourceUrl} --query accessToken -o tsv`,
{ encoding: "utf-8", timeout: 15000 },
)
const out = stdout.trim()
if (out) {
token = out
expiresAt = parseTokenExpiry(out)
}
} catch (err: any) {
// Capture stderr so the final error message can hint at the root cause
// (e.g. "Please run 'az login'", "subscription not found").
azCliStderr = String(err?.stderr ?? err?.message ?? "").slice(0, 200).trim()
}
}

⚠️ Potential issue | 🟡 Minor

Prefer execFile over shell interpolation for az account get-access-token.

resourceUrl flows from config.azure_resource_url (user-supplied) directly into a shell command string. In the common case it's a well-formed URL, but a malformed/hostile value such as https://x/;other-cmd would be interpreted by the shell. Since the az invocation doesn't actually need a shell, execFile with arg array sidesteps both injection and quoting concerns and also keeps timeout/stderr semantics.

🛡️ Suggested change
-              const childProcess = await import("node:child_process")
-              const { promisify } = await import("node:util")
-              const execAsync = promisify(childProcess.exec)
-              const { stdout } = await execAsync(
-                `az account get-access-token --resource ${resourceUrl} --query accessToken -o tsv`,
-                { encoding: "utf-8", timeout: 15000 },
-              )
+              const childProcess = await import("node:child_process")
+              const { promisify } = await import("node:util")
+              const execFileAsync = promisify(childProcess.execFile)
+              const { stdout } = await execFileAsync(
+                "az",
+                ["account", "get-access-token", "--resource", resourceUrl, "--query", "accessToken", "-o", "tsv"],
+                { encoding: "utf-8", timeout: 15000 },
+              )

Note: the corresponding test stub at packages/drivers/test/sqlserver-unit.test.ts line 98 would need to also stub execFile (or switch the test to assert on execFileAsync's args array).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/drivers/src/sqlserver.ts` around lines 186 - 200, The current call
uses execAsync with a shell-interpolated command including resourceUrl (user
input), risking shell injection; replace that usage by invoking execFile (or
your promisified execFileAsync) to call "az" with args ["account",
"get-access-token", "--resource", resourceUrl, "--query", "accessToken", "-o",
"tsv"] so the value is passed as an argv element (no shell), retain the same
options (encoding, timeout) and preserve the existing handling of stdout -> out
-> token/expiresAt and azCliStderr in the catch; also update the related test
stub to mock execFile/execFileAsync instead of execAsync or assert the argv
array.

Comment on lines +810 to 820
const sourceSql = `SELECT * FROM ${sourceTableRef} WHERE ${sourcePartWhere}`
const targetSql = `SELECT * FROM ${targetTableRef} WHERE ${targetPartWhere}`

const result = await runDataDiff({
...params,
where_clause: fullWhere,
source: sourceSql,
target: targetSql,
// Preserve the user's shared where_clause — it's dialect-neutral.
where_clause: params.where_clause,
partition_column: undefined, // prevent recursion
})

⚠️ Potential issue | 🟠 Major

Partitioned diff silently regresses extra_columns auto-discovery to key-only.

After this rework, runPartitionedDiff replaces params.source/params.target with SELECT * FROM … subqueries before recursing into runDataDiff. Inside that recursive call, the auto-discovery at lines 938–949 invokes discoverExtraColumns(params.source, …), which bails out here because the "table name" now starts with SELECT:

// line 444
if (SQL_KEYWORDS.test(tableName)) return undefined

extraColumns stays empty, spec.config.extra_columns becomes [], and — per the comment at lines 932–934 — the Rust engine then compares key existence only and reports rows as "identical" even when non-key values differ. So whenever a user supplies partition_column without explicitly listing extra_columns, per-partition diffs silently lose value comparison.

Do the discovery once in runPartitionedDiff (where params.source is still a plain table name — already guarded at line 701) and forward the result into the recursive call:

Suggested fix
   const sourceDialect = resolveDialect(params.source_warehouse)
   const targetDialect = resolveDialect(params.target_warehouse ?? params.source_warehouse)
   const { table1Name, table2Name } = resolveTableSources(
     params.source,
     params.target,
     sourceDialect,
     targetDialect,
   )

+  // Auto-discover extra_columns up front so the recursive runDataDiff call
+  // (which sees a SELECT subquery as params.source) doesn't silently fall
+  // back to key-only comparison.
+  let discoveredExtra = params.extra_columns
+  let discoveredExcluded: string[] | undefined
+  if (!discoveredExtra || discoveredExtra.length === 0) {
+    const discovered = await discoverExtraColumns(
+      params.source,
+      params.key_columns,
+      sourceDialect,
+      params.source_warehouse,
+    )
+    if (discovered) {
+      discoveredExtra = discovered.columns
+      discoveredExcluded = discovered.excludedAudit
+    }
+  }
   ...
     const result = await runDataDiff({
       ...params,
       source: sourceSql,
       target: targetSql,
       where_clause: params.where_clause,
+      extra_columns: discoveredExtra,
       partition_column: undefined, // prevent recursion
     })

(You may also want to surface discoveredExcluded as excluded_audit_columns on the aggregated result for parity with the non-partitioned path.)
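For reference, the bail-out behaves like this. Only the shape of the check (`SQL_KEYWORDS.test(tableName)` causing an early `undefined` return at line 444) comes from the source; the exact keyword list below is an assumption for illustration.

```typescript
// Illustrative sketch of the guard quoted above; the keyword list is assumed.
const SQL_KEYWORDS = /^\s*(SELECT|WITH|INSERT|UPDATE|DELETE|MERGE)\b/i

function passesTableNameGuard(source: string): boolean {
  // discoverExtraColumns bails out (returns undefined) when the "table name"
  // is actually a SQL statement, which is exactly what the partition
  // subqueries look like.
  return !SQL_KEYWORDS.test(source)
}

passesTableNameGuard("dbo.orders") // true: discovery proceeds
passesTableNameGuard("SELECT * FROM dbo.orders WHERE dt = '2024-01-01'") // false: discovery bails out
```

This is why the discovery has to happen before the source is rewritten into a `SELECT` subquery.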

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/src/altimate/native/connections/data-diff.ts` around lines
810-820: runPartitionedDiff currently replaces params.source/params.target
with SELECT subqueries and then calls runDataDiff, which prevents
discoverExtraColumns from recognizing real table names and causes extra_columns
to be lost; fix this by running discoverExtraColumns (and capturing
discoveredExcluded) inside runPartitionedDiff while params.source/params.target
are still plain table names, then pass the discovered extra_columns into the
recursive runDataDiff call (e.g., set params.extra_columns and
params.excluded_audit_columns or forward discoveredExcluded) instead of letting
runDataDiff attempt discovery on the SELECT strings; ensure you do not set
partition_column in the recursive call as before.

Commit 38cfb0e removed `@azure/identity` from the drivers package's
`peerDependencies` (tedious already bundles it), but the lockfile's
`packages/drivers` workspace section still carried the corresponding
`peerDependencies` and `optionalPeers` blocks. CI running
`bun install --frozen-lockfile` would fail on the drift.

Minimal edit — just removes the two stale blocks. No resolution changes
(`bun install --frozen-lockfile` passes with "no changes").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 7 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/opencode/src/altimate/native/connections/data-diff.ts">

<violation number="1" location="packages/opencode/src/altimate/native/connections/data-diff.ts:605">
P1: The new date parsing rewrites MySQL/MariaDB week partition values (`YYYY-WW`) into `YYYY-MM-DD`, causing partition WHERE clauses that never match.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

Comment on lines +605 to +606
const d = new Date(trimmed)
return Number.isNaN(d.getTime()) ? trimmed : d.toISOString().slice(0, 10)

@cubic-dev-ai cubic-dev-ai bot Apr 17, 2026


P1: The new date parsing rewrites MySQL/MariaDB week partition values (YYYY-WW) into YYYY-MM-DD, causing partition WHERE clauses that never match.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/altimate/native/connections/data-diff.ts, line 605:

<comment>The new date parsing rewrites MySQL/MariaDB week partition values (`YYYY-WW`) into `YYYY-MM-DD`, causing partition WHERE clauses that never match.</comment>

<file context>
@@ -591,7 +591,21 @@ function buildPartitionWhereClause(
+    // (e.g. from Postgres, BigQuery, DATE_FORMAT MySQL output) flow through
+    // without surprising timezone shifts.
+    if (/^\d{4}-\d{2}-\d{2}(\s|T|$)/.test(trimmed)) return trimmed.slice(0, 10)
+    const d = new Date(trimmed)
+    return Number.isNaN(d.getTime()) ? trimmed : d.toISOString().slice(0, 10)
+  })()
</file context>
Suggested change
-    const d = new Date(trimmed)
-    return Number.isNaN(d.getTime()) ? trimmed : d.toISOString().slice(0, 10)
+    if (!/[A-Za-z]/.test(trimmed)) return trimmed
+    const d = new Date(trimmed)
+    return Number.isNaN(d.getTime()) ? trimmed : d.toISOString().slice(0, 10)
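The failure mode is easy to reproduce in isolation. A standalone sketch of the behavior (not project code):

```typescript
// A MySQL/MariaDB week partition key like "2024-05" (year-week 05) has no
// letters and does not match the YYYY-MM-DD fast path, so it reaches Date
// parsing, which treats it as the month of May:
const weekKey = "2024-05"
const rewritten = new Date(weekKey).toISOString().slice(0, 10)
console.log(rewritten) // "2024-05-01": week 05 silently became May 1st

// The suggested guard only Date-parses values containing letters (e.g.
// "Apr 17 2026"), so purely numeric partition keys pass through untouched:
const guarded = /[A-Za-z]/.test(weekKey) ? rewritten : weekKey
console.log(guarded) // "2024-05"
```

Note that `new Date("2024-05")` follows the ECMAScript date-time string format, where a bare `YYYY-MM` is parsed as midnight UTC on the first of that month, so the rewrite is deterministic regardless of local timezone.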


3 participants